Exploiting Cluster Analysis for Constructing Multi-dimensional Histograms on Both Static and Evolving Data

Authors

  • Filippo Furfaro
  • Giuseppe M. Mazzeo
  • Cristina Sirangelo
Abstract

Density-based clustering techniques are investigated as a basis for constructing histograms in multi-dimensional scenarios, where traditional techniques fail to provide effective data synopses. The main idea is that locating dense and sparse regions can be exploited to partition the data into homogeneous buckets, preventing dense and sparse regions from being summarized in the same aggregate data. The use of clustering techniques to support histogram construction is investigated in the context of both static and dynamic data; in the dynamic case, incremental clustering strategies are mandatory, because performing the clustering task from scratch at each data update is inefficient.
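
The idea of keeping dense and sparse regions in separate buckets can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a simple grid-based density test for a real density-based clustering method, works on 2-D points only, and all names (`build_buckets`, `estimate_count`, `cell_size`, `dense_threshold`) are hypothetical. Adjacent dense cells are merged into one dense bucket each, and all remaining occupied cells share a single sparse bucket.

```python
from collections import Counter, deque
from math import floor


def build_buckets(points, cell_size=1.0, dense_threshold=3):
    """Group 2-D points into dense buckets plus one sparse remainder bucket.

    Assumption: a grid cell is "dense" when it holds at least
    `dense_threshold` points. This stands in for a real density-based
    clustering step and is not the method from the paper.
    """
    # Count how many points fall into each grid cell.
    counts = Counter(
        (floor(x / cell_size), floor(y / cell_size)) for x, y in points
    )
    dense = {cell for cell, n in counts.items() if n >= dense_threshold}

    buckets, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over 4-connected dense cells: one cluster.
        seen.add(start)
        queue, cells = deque([start]), []
        while queue:
            cx, cy = queue.popleft()
            cells.append((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        buckets.append({
            "kind": "dense",
            "cells": sorted(cells),
            "count": sum(counts[c] for c in cells),
        })

    # Every occupied cell that is not dense goes into one sparse bucket.
    sparse_cells = sorted(c for c in counts if c not in dense)
    if sparse_cells:
        buckets.append({
            "kind": "sparse",
            "cells": sparse_cells,
            "count": sum(counts[c] for c in sparse_cells),
        })
    return buckets


def estimate_count(buckets, cell):
    """Estimate the points in one grid cell, assuming uniformity per bucket."""
    for bucket in buckets:
        if cell in bucket["cells"]:
            return bucket["count"] / len(bucket["cells"])
    return 0.0  # cell not covered by any bucket
```

Because dense and sparse cells never share a bucket, the uniformity assumption used in `estimate_count` is applied only within regions of similar density, which is the property the abstract describes.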

Similar resources

Constructing Two-Dimensional Multi-Wavelet for Solving Two-Dimensional Fredholm Integral Equations

In this paper, a two-dimensional multi-wavelet is constructed in terms of Chebyshev polynomials. The constructed multi-wavelet is an orthonormal basis for space. By discretization, the two-dimensional Fredholm integral equation reduces to an algebraic system. The obtained system is solved by the Galerkin method in the subspace of by using two-dimensional multi-wavelet bases. Because the bases of subs...

Efficient Selectivity Estimation by Histogram Construction Based on Subspace Clustering

Modern databases have to cope with multi-dimensional queries. For efficient processing of these queries, query optimization relies on multi-dimensional selectivity estimation techniques. These techniques in turn typically rely on histograms. A core challenge of histogram construction is the detection of regions with a density higher than the ones of their surroundings. In this paper, we show th...

Image retrieval using color histograms generated by Gauss mixture vector quantization

Image retrieval based on color histograms requires quantization of a color space. Uniform scalar quantization of each color channel is a popular method for the reduction of histogram dimensionality. With this method, however, no spatial information among pixels is considered in constructing the histograms. Vector quantization (VQ) provides a simple and effective means for exploiting spatial inf...

Methods for regression analysis in high-dimensional data

With advances in science, knowledge, and technology, new and precise methods for measuring, collecting, and recording information have been developed, leading to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation

This paper aims to improve the accuracy of query result-size estimations in query optimizers by leveraging the dynamic feedback obtained from observations on the executed query workload. To this end, an approximate "synopsis" of data-value distributions is devised that combines histograms with parametric curve fitting, leading to a specific class of linear splines. The approach reconciles the bene...

Publication date: 2006